Molecular Epidemiological Characteristics of Mycobacterium abscessus Complex Derived from Non-Cystic Fibrosis Patients in Japan and Taiwan

ABSTRACT Mycobacterium abscessus complex (MABC) is a group of emerging, highly antimicrobial-resistant non-tuberculous mycobacteria. Specific MABC clones are spreading globally in patients with cystic fibrosis (CF); however, associated genomic epidemiology is lacking in East Asia, with very few patients with CF. Here, we investigated MABC populations derived from non-CF patients in Japan and Taiwan. Analysis of whole-genome sequencing data of 220 MABC isolates revealed that 112, 105, and 3 were M. abscessus subsp. abscessus (ABS), M. abscessus subsp. massiliense (MAS), and M. abscessus subsp. bolletii (BOL), respectively. Moreover, >50% of ABS and >70% of MAS were related to four predominant clones in the region. Known mutations conferring macrolide resistance were rare (1.4%) and were not enriched in the predominant clones. Conversely, the macrolide-susceptible erm(41) T28C mutation was significantly enriched in one predominant ABS clone. The most predominant ABS clone was genetically related to the previously described dominant circulating clone (DCC)1 in patients with CF, whereas no isolates were related to DCC2; isolates related to DCC3 were not necessarily predominant in our sample set. We found that the erm(41) T28C mutants spread globally, and some of them reacquired the functional erm(41) gene through both point mutation and recombination. This study revealed predominant MABC clones in Japan and Taiwan and their relationship with the globally superadding clones in the patient community with CF. Our study provides insights into the genetic characteristics of globally dominant and area-specific strains isolated from patients with or without CF and differences between globally spread and regionally specific strains. IMPORTANCE Members of Mycobacterium abscessus complex (MABC) are frequently isolated from patients. Studies have reported that predominant clones of MABC (known as dominant circulating clones; DCCs) are distributed worldwide and transmitted from humans to humans in patients with cystic fibrosis (CF). However, associated genomic epidemiology has not yet been conducted in East Asia, including Japan and Taiwan, where there are only a few patients with CF. Using whole-genome sequencing data derived from non-CF patients in Japan and Taiwan, we revealed prevalent clones and the incidence of macrolide resistance-associated mutations in the MABC population in this region. We also clarified the associations between these predominant clones and DCCs in the global CF patient community. Our results would assist further studies in elucidating the genetic characteristics of strains isolated from patients with or without CF, the differences between globally spread and regionally specific strains, and the adaptive evolution of MABC within the host.

identified 235,540 variable positions within the core-genome. This alignment was used to infer recombination sites using Gubbins (6), and 8,358 recombinogenic sites were detected and removed from the alignment. The resulting alignment containing 240,429 recombination-free variable positions located in the core-genome region were used to reconstruct a maximum-likelihood tree using RAxML ver. 8.2.12 (7) with the General Time Reversible (GTR)-GAMMA model of nucleotide substitution and 300 bootstrap replicates. To confirm phylogeny-based subspecies identification, ANI values among all MABC isolates were calculated using fastANI with default settings (8). Multiple whole-genome alignments were used to investigate the genotypes of the erm(41) gene (MAB_2297, 2,345,955 bp to 2,346,476 bp in ATCC19977) and the rrl gene (MAB_r5052, 1,464,208 bp to 1,467,319 bp in ATCC19977) of all isolates using SeqKit (9).
After subspecies identification, the maximum-likelihood phylogeny within each ABS and MAS was constructed as described above. The complete genome sequences of ATCC19977 and JCM15300 were used as a reference, respectively. Alignments containing 76,114 and 48,718 recombination-free variable positions located on their core-genome regions (3,963,788 bp for ABS and 4,033,769 bp for MAS) were used to estimate maximum-likelihood phylogenies, respectively.
The resulting phylogenetic trees were used to identify clusters (ABS-EA and MAS-EA clusters) in Japan and Taiwan using the TreeGubbins software with options '-s (significance cut-off level) 0.01, -p (number of permutations to run to test significance) 10000' (https://github.com/simonrharris/tree_gubbins). Only detected clusters from more than one location and greater than three isolates were considered ABS-EA/MAS-EA clusters to exclude possible point source outbreaks.
To assess the relationships among Japanese (around Tokyo and Okinawa), Taiwanese, and circulating clones of ABS or MAS in a total of eight countries, we combined our data set with publicly available WGS data from 476 MABC clinical isolates (349 ABS and 127 MAS, Table   S1). These were analyzed in three previous WGS-based epidemiological studies (10)(11)(12), with one strain per patient and known cluster information to which each isolate belongs (A. Floto, J. Parkhill, and J. Bryant, personal communication). All additional raw read data were obtained from the Sequence Read Archive (SRA) and assembled de novo using the Shovill pipeline. To measure the assembly quality of these additional isolates, we calculated the genome fraction (%) of each isolate, which is the total number of aligned bases in the reference genomes (ATCC19977 or JCM15300) using QUAST software (13). We used only isolates with genome fractions greater than 85% for downstream analyses. Alignments containing 102,613 and 58,975 recombination-free variable positions located on their core-genome regions (3,584,621 bp for ABS and 3,789,583 bp for MAS) were used to estimate maximum-likelihood phylogenies, and cluster analysis was performed as described above.

Analysis of lineage-associated genes
Lineage-associated genes were identified as described previously (14). In brief, we annotated all draft genomes and complete genome sequences of the three type strains (ATCC19977, JCM15300, and BD) of MABC using DFAST_core ver. 1.0.3 with default settings (15) and Roary software with default settings (16) to compute core genes or accessory genes. We identified accessory genes that were significantly associated with lineage using the Scoary (17).

Statistics
Statistical analyses were performed with R software (www.r-project.org). The R function multinom() was used to statistically assess the differences in the composition of MABC subspecies at the three locations. The R function wilcox.exact() was used to assess the statistical significance of the differences in nucleotide diversities and pairwise SNP distances among MABC clinical isolates. Enrichments of macrolide resistance-associated mutations in ABS/MAS-EA clusters were statistically assessed using the R function fisher.test(). Figure S1. Average nucleotide identity (ANI) values among MABC clinical isolates from Japan and Taiwan. ANI values among 220 clinical isolates and three reference strains (ATCC19977, JCM15300, and BD) were measured for all strain pairs using fastANI (8). A boxplot indicates the distribution of ANI values within or between MABC subspecies. The Mann-Whitney U-test was performed to assess statistical significance between ANI values within or between the subspecies. The minimum ANI among MABC clinical isolates was 96.4%, all ANI values within the three subspecies were > 98% (minimum: 98.1%), while all ANI values between subspecies were < 98% (maximum: 97.6%). This indicates that the species and subspecies boundaries of the MABC clinical isolates were approximately 96% and 98% ANI, respectively, which is consistent with previous results (14). Figure S2. Populational comparison between ABS and MAS in Japan and Taiwan. (A) Nucleotide diversity of ABS or MAS clinical isolates in Japan and Taiwan was calculated for 10 kb nonoverlapping sliding windows of the genome alignments using the R package PopGenome (18). The Mann-Whitney U-test was used to assess the statistical significance of the differences between nucleotide diversities. (B) Comparison of pan-genome size between ABS and MAS clinical isolates in Japan and Taiwan. Orthologous gene families were identified using Roary software (16). γ denotes the parameter estimate obtained by data fitting to a power model n = κ×N^γ, where n is the number of gene families, N is the number of sampled genomes, and κ and γ are coefficients. (C) Pairwise SNP distances among clinical isolates belonging to each of six ABS-EA or five MAS-EA clusters. Whole-genome alignments containing recombination-free variable positions located in core genomes were used to calculate the SNP distances using snp-dist (https://github.com/tseemann/snp-dists). The Mann-Whitney U-test was used to assess the statistical significance of the differences in pairwise SNP distances between clinical isolates comprising the ABS-and MAS-EA clusters. Figure S3. erm(41) gene among ABS clinical isolates. Overview of recombination events in a total of 461 ABS clinical isolates. Recombination events in internal branches (red boxes) were present in multiple clinical isolates and were shared through clonal descent, while those in the terminal branches (blue boxes) were clinical isolate-specific and represent independent recent acquisitions. Phylogenetic trees were estimated as described in Fig. 4. The presence and absence of mutations in the erm(41) gene, disease status (CF or non-CF) of corresponding patients, and the region where the clinical isolate was obtained are shown as Fig. 4.